Tuning the Scheduling of Distributed Stochastic Gradient Descent with Bayesian Optimization
Authors
Abstract
We present an optimizer which uses Bayesian optimization to tune the system parameters of distributed stochastic gradient descent (SGD). Given a specific context, our goal is to quickly find efficient configurations which appropriately balance the load between the available machines to minimize the average SGD iteration time. Our experiments consider setups with over thirty parameters. Traditional Bayesian optimization, which uses a Gaussian process as its model, is not well suited to such high-dimensional domains. To reduce convergence time, we exploit the available structure. We design a probabilistic model which simulates the behavior of distributed SGD and use it within Bayesian optimization. Our model can exploit many runtime measurements for inference per evaluation of the objective function. Our experiments show that the resulting optimizer converges to efficient configurations within ten iterations, and that the optimized configurations outperform those found by a generic optimizer in thirty iterations by up to 2×.
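The loop described above can be illustrated with a minimal sketch of Gaussian-process-based Bayesian optimization over distributed-SGD system parameters. This is a generic GP sketch, not the paper's structured probabilistic model; the per-worker load-fraction parameterization and the `measure_avg_iteration_time` objective are hypothetical stand-ins for timing real SGD iterations under a candidate configuration.

```python
# Minimal sketch (assumed setup, not the paper's specialized model): tune
# per-worker load fractions with GP-based Bayesian optimization to minimize
# the measured average SGD iteration time.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
n_workers = 4                                   # hypothetical parameter count
bounds = np.tile([[0.05, 1.0]], (n_workers, 1)) # per-worker load fraction range

def measure_avg_iteration_time(config):
    # Placeholder objective: stands in for running a few distributed SGD
    # iterations under `config` and averaging the measured iteration time.
    load = config / config.sum()
    return load.max() + 0.01 * abs(rng.standard_normal())  # straggler-dominated

def expected_improvement(X, gp, y_best):
    # Standard EI acquisition for minimization.
    mu, sigma = gp.predict(X, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Seed with a few random configurations.
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, n_workers))
y = np.array([measure_avg_iteration_time(x) for x in X])

for _ in range(10):  # Bayesian optimization iterations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2000, n_workers))
    x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, measure_avg_iteration_time(x_next))

print("best config:", X[np.argmin(y)], "avg iteration time:", y.min())
```

In the paper's setting the plain GP surrogate above is exactly what struggles in thirty-plus dimensions; the proposed approach replaces it with a structured model of distributed SGD that can absorb many runtime measurements per configuration evaluation.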
Similar Resources
Unbounded Bayesian Optimization via Regularization
Bayesian optimization has recently emerged as a powerful and flexible tool in machine learning for hyperparameter tuning and more generally for the efficient global optimization of expensive black box functions. The established practice requires a user-defined bounded domain, which is assumed to contain the global optimizer. However, when little is known about the probed objective function, it ...
Elastic Distributed Bayesian Collaborative Filtering
In this paper, we consider learning a Bayesian collaborative filtering model on a shared cluster of commodity machines. Two main challenges arise: (1) How can we parallelize and distribute Bayesian collaborative filtering? (2) How can our distributed inference system handle elasticity events common in a shared, resource managed cluster, including resource ramp-up, preemption, and stragglers? To...
Learning to Learn without Gradient Descent by Gradient Descent
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-paramete...
HAMSI: Distributed Incremental Optimization Algorithm Using Quadratic Approximations for Partially Separable Problems
We present HAMSI, a provably convergent incremental algorithm for solving large-scale partially separable optimization problems that frequently emerge in machine learning and inferential statistics. The algorithm is based on a local quadratic approximation and hence allows incorporating second-order curvature information to speed up the convergence. Furthermore, HAMSI needs almost no tuning, ...
Optimization of the Microgrid Scheduling with Considering Contingencies in an Uncertainty Environment
In this paper, a stochastic two-stage model is offered for optimization of the day-ahead scheduling of the microgrid. System uncertainties including dispatchable distributed generation and energy storage contingencies are considered in the stochastic model. For handling uncertainties, Monte Carlo simulation is employed for generating several scenarios, and then a reduction method is used to decr...
Journal title:
- CoRR
Volume: abs/1612.00383
Pages: -
Publication year: 2016